
    A Non-Asymptotic Bandwidth Selection Method for Kernel Density Estimation of Discrete Data

    In this paper we explore a method for modeling categorical data derived from the principles of the Generalized Cross Entropy method. The method builds on standard kernel density estimation techniques by providing a novel non-asymptotic, data-driven bandwidth selection rule. In addition, the entropic approach provides model sparsity not present in the standard kernel approach. Numerical experiments with 10-dimensional binary medical data are conducted. The experiments suggest that the Generalized Cross Entropy approach is a viable method for density estimation, discriminant analysis, and classification.
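A standard discrete kernel density estimator of the kind this paper builds on can be sketched with the product Aitchison-Aitken kernel for binary data. The bandwidth `lam` below is fixed by hand purely for illustration; it stands in for the value the paper's non-asymptotic GCE selection rule would produce.

```python
import numpy as np

def aitchison_aitken_kde(data, x, lam=0.3):
    # Discrete KDE for d-dimensional binary vectors with the product
    # Aitchison-Aitken kernel: weight (1 - lam) per matching coordinate,
    # lam per mismatching one.  lam = 0.3 is a hand-picked bandwidth,
    # not the paper's data-driven choice.
    data = np.asarray(data, dtype=int)
    match = data == np.asarray(x, dtype=int)
    weights = np.where(match, 1.0 - lam, lam).prod(axis=1)
    return weights.mean()
```

Because the per-coordinate kernel weights sum to one, the estimate is a proper probability mass function: summing it over all 2^d binary vectors gives exactly 1.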

    Monte Carlo Estimation of the Density of the Sum of Dependent Random Variables

    We study an unbiased estimator for the density of a sum of random variables simulated from a computer model. A numerical study on examples with copula dependence is conducted, in which the proposed estimator performs favourably in terms of variance compared with other unbiased estimators. We provide applications and extensions to the estimation of marginal densities in Bayesian statistics and to the estimation of the density of sums of random variables under Gaussian copula dependence.
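The basic idea behind such unbiased density estimators is conditional Monte Carlo: for S = X1 + X2, the density satisfies f_S(s) = E[f_{X2}(s - X1)], so averaging the known conditional density over draws of X1 gives an unbiased estimate. The sketch below uses independent N(0, 1) inputs rather than the paper's copula-dependent setting, because independence makes the answer exactly checkable (S ~ N(0, 2)).

```python
import numpy as np

def normal_pdf(x, sigma=1.0):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def density_of_sum(s, n=200_000, seed=0):
    # Conditional Monte Carlo estimator of f_S(s) for S = X1 + X2:
    # f_S(s) = E[ f_{X2}(s - X1) ].  Each term is an unbiased estimate,
    # so the average is unbiased -- no kernel smoothing bias.
    x1 = np.random.default_rng(seed).standard_normal(n)
    return normal_pdf(s - x1).mean()
```

For example, `density_of_sum(0.0)` should be close to the exact value `normal_pdf(0.0, sigma=np.sqrt(2.0))`.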

    Splitting Methods for Efficient Combinatorial Counting and Rare-Event Probability Estimation

    This paper is divided into two major parts. In the first part we describe a new Monte Carlo algorithm for the consistent and unbiased estimation of multidimensional integrals and for efficient sampling from multidimensional densities. The algorithm is inspired by the classical splitting method and can be applied to general static simulation models. We provide examples from rare-event probability estimation, counting, optimization, and sampling, demonstrating that the proposed method can outperform existing Markov chain sampling methods in terms of convergence speed and accuracy. In the second part we present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability.
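To make the second part concrete, here is the baseline that normal reference rules produce: a fixed-bandwidth Gaussian KDE with Silverman's rule of thumb. This is exactly the kind of arbitrary reference the paper's diffusion-based plug-in selector is designed to remove, so treat it as the point of comparison, not the paper's method.

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth=None):
    # Fixed-bandwidth Gaussian KDE.  The default bandwidth is Silverman's
    # normal-reference rule, 1.06 * sigma_hat * n^(-1/5) -- the baseline
    # the paper's plug-in selector improves on.
    data = np.asarray(data, dtype=float)
    n = data.size
    if bandwidth is None:
        bandwidth = 1.06 * data.std(ddof=1) * n ** (-0.2)
    z = (grid[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))
```

The returned values form a valid density estimate: non-negative and integrating to one over a sufficiently wide grid.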

    Kernel Density Estimation with Linked Boundary Conditions

    Kernel density estimation on a finite interval poses an outstanding challenge because of the well-recognized bias at the boundaries of the interval. Motivated by an application in cancer research, we consider a boundary constraint linking the values of the unknown target density function at the boundaries. We provide a kernel density estimator (KDE) that successfully incorporates this linked boundary condition, leading to a non-self-adjoint diffusion process and expansions in non-separable generalized eigenfunctions. The solution is rigorously analyzed through an integral representation given by the unified transform (or Fokas method). The new KDE possesses many desirable properties, such as consistency, asymptotically negligible bias at the boundaries, and an increased rate of approximation, as measured by the AMISE. We apply our method to the motivating example in biology and provide numerical experiments with synthetic data, including comparisons with state-of-the-art KDEs (which currently cannot handle linked boundary constraints). Results suggest that the new method is fast and accurate. Furthermore, we demonstrate how to build statistical estimators of the boundary conditions satisfied by the target function without a priori knowledge. Our analysis can also be extended to more general boundary conditions that may be encountered in applications.
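The boundary bias being corrected here is easy to exhibit. A classical fix, shown below, is reflection of the sample at the endpoints; it removes first-order boundary bias but, unlike the paper's estimator, cannot impose a *linked* condition coupling the density values at the two ends. This sketch is the standard technique, not the paper's method.

```python
import numpy as np

def reflection_kde(data, grid, bandwidth):
    # Boundary-corrected Gaussian KDE on [0, 1]: mirror the sample at both
    # endpoints (x -> -x and x -> 2 - x) before smoothing.  An uncorrected
    # KDE would estimate roughly half the true density at each boundary.
    data = np.asarray(data, dtype=float)
    mirrored = np.concatenate([data, -data, 2.0 - data])
    z = (grid[:, None] - mirrored[None, :]) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum(axis=1)
    return dens / (data.size * bandwidth * np.sqrt(2.0 * np.pi))
```

On uniform data the corrected estimate stays near 1 at the endpoints instead of collapsing toward 0.5.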

    Column Subset Selection and Nyström Approximation via Continuous Optimization

    We propose a continuous optimization algorithm for the Column Subset Selection Problem (CSSP) and Nyström approximation. The CSSP and Nyström method construct low-rank approximations of matrices based on a predetermined subset of columns. It is well known that choosing the best column subset of size k is a difficult combinatorial problem. In this work, we show how one can approximate the optimal solution by defining a penalized continuous loss function which is minimized via stochastic gradient descent. We show that the gradients of this loss function can be estimated efficiently using matrix-vector products with a data matrix X in the case of the CSSP, or a kernel matrix K in the case of the Nyström approximation. We provide numerical results for a number of real datasets showing that this continuous optimization is competitive against existing methods.
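The object being optimized over is the column subset feeding a Nyström approximation, which itself is a one-liner: given a PSD kernel matrix K and index set S, approximate K by C W^+ C^T with C = K[:, S] and W = K[S, S]. The sketch below shows that construction (the paper's contribution is *choosing* S by continuous optimization, which is not reproduced here).

```python
import numpy as np

def nystrom_approx(K, idx):
    # Nystrom low-rank approximation K ~= C @ pinv(W) @ C.T built from the
    # column subset `idx`.  The approximation is exact whenever the
    # principal submatrix W has the same rank as K.
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T
```

A quick sanity check: if K has rank k and the selected k columns span its range, the reconstruction is exact, which is why a good subset choice matters.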

    Variance Reduction for Matrix Computations with Applications to Gaussian Processes

    In addition to recent developments in computing speed and memory, methodological advances have contributed to significant gains in the performance of stochastic simulation. In this paper, we focus on variance reduction for matrix computations via matrix factorization. We provide insights into existing variance reduction methods for estimating the entries of large matrices. Popular methods do not exploit the reduction in variance that is possible when the matrix is factorized. We show how computing the square root factorization of the matrix can achieve, in some important cases, arbitrarily better stochastic performance. In addition, we propose a factorized estimator for the trace of a product of matrices and numerically demonstrate that the estimator can be up to 1,000 times more efficient on certain problems of estimating the log-likelihood of a Gaussian process. Additionally, we provide a new estimator of the log-determinant of a positive semi-definite matrix where the log-determinant is treated as a normalizing constant of a probability density.
    Comment: 20 pages, 3 figures
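The baseline these factorized estimators improve on is the plain Hutchinson trace estimator, tr(A) = E[z^T A z] with Rademacher probes z, which needs only matrix-vector products with A, the same access model the paper assumes. The paper's variance reduction via square-root factors is not reproduced in this sketch.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_probes=256, seed=0):
    # Hutchinson estimator: tr(A) = E[z^T A z] for Rademacher probes z.
    # `matvec` computes x -> A @ x, so A never needs to be formed
    # explicitly.  This is the unfactorized baseline estimator.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=dim)
        total += z @ matvec(z)
    return total / n_probes
```

A useful property of Rademacher probes: for a diagonal matrix the cross terms vanish and every probe returns the exact trace, so the estimator has zero variance there.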